A measure of variance for hierarchical nominal attributes

Authors

  • Josep Domingo-Ferrer
  • Agusti Solanas
Abstract

The need for measuring the dispersion of nominal categorical attributes appears in several applications, like clustering or data anonymization. For a nominal attribute whose categories can be hierarchically classified, a measure of the variance of a sample drawn from that attribute is proposed which takes the attribute’s hierarchy into account. The new measure is the reciprocal of “consanguinity”: the less related the nominal categories in the sample, the higher the measured variance. For non-hierarchical nominal attributes, the proposed measure yields results consistent with previous diversity indicators. Applications of the new nominal variance measure to economic diversity measurement and data anonymization are also discussed. © 2008 Elsevier Inc. All rights reserved.


Similar resources

Hierarchical Group Compromise Ranking Methodology Based on Euclidean–Hausdorff Distance Measure Under Uncertainty: An Application to Facility Location Selection Problem

Proposing a hierarchical group compromise method can be regarded as one of the major multi-attribute decision-making tools that can be introduced to rank the possible alternatives among conflicting criteria. Decision makers’ (DMs’) judgments are considered as imprecise or fuzzy in complex and hesitant situations. In group decision making, an aggregation of DMs’ judgments and fuzzy group compromi...


Anonymization of nominal data based on semantic marginality

Nominal attributes are very common in data sets about individuals, specifically medical data like patient healthcare records. Attributes of this type tend to be sensitive due to their personal nature. If public-use data sets need to be released, e.g. for clinical research purposes, data should be first anonymized. However, since most anonymization methods omit data semantics when dealing with n...


Dissimilarity learning for nominal data

Defining a good distance (dissimilarity) measure between patterns is of crucial importance in many classification and clustering algorithms. While a lot of work has been performed on continuous attributes, nominal attributes are more difficult to handle. A popular approach is to use the value difference metric (VDM) to define a real-valued distance measure on nominal values. However, VDM treats...


Marginality: A Numerical Mapping for Enhanced Exploitation of Taxonomic Attributes

Hierarchical attributes appear in taxonomic or ontology-based data (e.g. NACE economic activities, ICD-classified diseases, animal/plant species, etc.). Such taxonomic data are often exploited as if they were flat nominal data without hierarchy, which implies losing substantial information and analytical power. We introduce marginality, a numerical mapping for taxonomic data that allows using on...


New Ant Colony Optimisation Algorithms for Hierarchical Classification of Protein Functions

Ant colony optimisation (ACO) is a metaheuristic to solve optimisation problems inspired by the foraging behaviour of ant colonies. It has been successfully applied to several types of optimisation problems, such as scheduling and routing, and more recently for the discovery of classification rules. The classification task in data mining aims at predicting the value of a given goal attribute fo...



Journal:
  • Inf. Sci.

Volume 179


Publication date: 2008